- 
            360 video streaming presents unique challenges in bandwidth efficiency and motion-to-photon (MTP) latency, particularly for live multi-user scenarios. While viewport prediction (VP) has emerged as the dominant solution, its effectiveness in live streaming is limited by training data scarcity and the unpredictability of live content. We present 360LIVECAST, the first practical multicast framework for live 360 video that eliminates the need for VP through two key innovations: (1) a novel viewport hull representation that combines current viewports with marginal regions, enabling local frame synthesis while reducing bandwidth by 60% compared to full panorama transmission, and (2) a viewport-specific hierarchical multicast framework leveraging edge computing to handle viewer dynamics while maintaining sub-25ms MTP latency. Extensive evaluation using real-world network traces and viewing trajectories demonstrates that 360LIVECAST achieves 26.9% lower latency than VP-based approaches while maintaining superior scalability.
            Free, publicly-accessible full text available August 5, 2026
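A minimal sketch of the viewport-hull idea described in this abstract, under assumed parameters: the viewer's current viewport plus a marginal region is mapped onto an equirectangular tile grid, and only those tiles are transmitted instead of the full panorama. The grid size, field of view, and margin below are illustrative assumptions, not the paper's settings, so the printed savings will not match the reported 60% figure.

```python
# Hypothetical viewport-hull tile selection: send the current viewport plus a
# margin so small head motions can be synthesized locally at the edge/client.
def viewport_hull_tiles(center_yaw_deg, center_pitch_deg,
                        fov_deg=90, margin_deg=30,
                        tile_cols=16, tile_rows=8):
    """Return the set of (col, row) equirectangular tiles covering the
    viewport plus a marginal region around it."""
    tile_w = 360 / tile_cols          # degrees of yaw per tile column
    tile_h = 180 / tile_rows          # degrees of pitch per tile row
    half = fov_deg / 2 + margin_deg   # half-extent of the hull

    tiles = set()
    for col in range(tile_cols):
        for row in range(tile_rows):
            yaw = -180 + (col + 0.5) * tile_w       # tile center yaw
            pitch = 90 - (row + 0.5) * tile_h       # tile center pitch
            dyaw = min(abs(yaw - center_yaw_deg),
                       360 - abs(yaw - center_yaw_deg))  # wrap-around in yaw
            if dyaw <= half and abs(pitch - center_pitch_deg) <= half:
                tiles.add((col, row))
    return tiles

if __name__ == "__main__":
    hull = viewport_hull_tiles(center_yaw_deg=0, center_pitch_deg=0)
    total = 16 * 8
    print(f"hull covers {len(hull)}/{total} tiles "
          f"({100 * (1 - len(hull) / total):.0f}% fewer tiles than the full panorama)")
```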
- 
            Effective training and debriefing are critical in high-stakes, mission-critical environments such as firefighting, where precision and error minimization are paramount. Traditional post-training analysis relies on manual review of 2D video, a process that is time-consuming and lacks comprehensive situational awareness. To address these limitations, we introduce ACT360, a novel system that leverages 360-degree video and machine learning for automated action detection and efficient debriefing. ACT360 incorporates 360YOWO, a customized You Only Watch Once (YOWO) model enhanced with a spatial attention mechanism and equirectangular-aware convolution (EAC) to handle the unique distortions of panoramic video data. To enable deployment in resource-constrained environments, we apply quantization and model pruning, reducing the model size by 74% while maintaining robust accuracy (mAP drop of only 1.5%, from 0.865 to 0.850) and improving inference speed. We validate our approach on a new, publicly available dataset of 55 labeled 360-degree videos covering seven key firefighting actions, recorded across various real-world practice sessions and environmental conditions. Furthermore, we integrate the pipeline with 360AIE (Action Insight Explorer), a web-based interface that provides automatic action detection, retrieval, and textual summarization of key events using large language models (LLMs), significantly improving post-incident analysis efficiency. ACT360 serves as a generalized framework for mission-critical debriefing, incorporating techniques such as EAC, spatial attention, summarization, and model optimization. These innovations apply to any training environment requiring lightweight action detection and structured post-exercise analysis.
            Free, publicly-accessible full text available June 16, 2026
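A hedged sketch of the compression step mentioned in this abstract (pruning followed by quantization), assuming a PyTorch implementation. The tiny stand-in network and the 70% sparsity target are hypothetical; they are not the actual 360YOWO model or the paper's settings.

```python
# Minimal pruning + quantization sketch, assuming PyTorch is available.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class TinyDetector(nn.Module):
    """Stand-in for a detection backbone (NOT the real 360YOWO)."""
    def __init__(self, num_actions=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_actions)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = TinyDetector()

# 1) L1-magnitude pruning of conv weights (weights are zeroed in place).
for module in model.modules():
    if isinstance(module, nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.7)
        prune.remove(module, "weight")   # bake the pruned weights in

# 2) Dynamic int8 quantization of the linear head for faster CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    scores = quantized(torch.randn(1, 3, 64, 64))
print(scores.shape)   # torch.Size([1, 7]) -> one score per firefighting action
```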
- 
            Multi-camera systems are essential in movies, live broadcasts, and other media. The selection of the appropriate camera for every moment has a decisive impact on production quality and audience preferences. Learning-based multi-camera view recommendation frameworks have been explored to assist professionals in decision making. This work explores how two standard cinematography practices can be incorporated into the learning pipeline: (1) not staying on the same camera for too long and (2) introducing a scene with a wider shot and gradually progressing to narrower ones. To this end, we encode (1) the duration for which the current camera has been displayed and (2) the camera identity as temporal and camera embeddings in a transformer architecture, implicitly guiding the model to learn the two practices from professionally labeled data. Experiments show that the proposed framework outperforms the baseline by 14.68% in six-way classification accuracy. Ablation studies on different approaches to embedding the temporal and camera information further verify the efficacy of the framework.
            Free, publicly-accessible full text available June 23, 2026
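A minimal sketch of the embedding idea described in this abstract, assuming a PyTorch transformer encoder: each per-frame feature token receives a camera-identity embedding and a "time on the current camera" embedding before entering the encoder. The feature dimension, camera count, and duration bucketing are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical view-recommendation model with temporal and camera embeddings.
import torch
import torch.nn as nn

class ViewRecommender(nn.Module):
    def __init__(self, feat_dim=256, num_cameras=6, max_duration=64):
        super().__init__()
        self.camera_emb = nn.Embedding(num_cameras, feat_dim)     # which camera is on air
        self.duration_emb = nn.Embedding(max_duration, feat_dim)  # frames shown so far
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.classifier = nn.Linear(feat_dim, num_cameras)        # six-way camera choice

    def forward(self, frame_feats, camera_ids, durations):
        # frame_feats: (B, T, feat_dim); camera_ids, durations: (B, T) int64
        x = frame_feats + self.camera_emb(camera_ids) + self.duration_emb(durations)
        x = self.encoder(x)
        return self.classifier(x[:, -1])   # recommend the camera for the next moment

B, T = 2, 16
model = ViewRecommender()
logits = model(torch.randn(B, T, 256),
               torch.randint(0, 6, (B, T)),
               torch.randint(0, 64, (B, T)))
print(logits.shape)   # torch.Size([2, 6])
```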
- 
            Efficient single-instance segmentation is critical for unlocking features in on-the-fly mobile imaging applications, such as photo capture and editing. Existing mobile solutions often restrict segmentation to portraits or salient objects due to computational constraints. Recent advances like the Segment Anything Model improve accuracy but remain too computationally expensive for mobile use because they process the entire image with heavy transformer backbones. To address this, we propose TraceNet, a one-click-driven single-instance segmentation model. TraceNet segments a user-specified instance by back-tracing the receptive field of a ConvNet backbone, focusing computation on relevant regions and reducing inference cost and memory usage on mobile devices. Starting from user needs in real mobile applications, we define efficient single-instance segmentation tasks and introduce two novel metrics to evaluate both accuracy and robustness to low-quality input clicks. Extensive evaluations on the MS-COCO and LVIS datasets highlight TraceNet’s ability to generate high-quality instance masks efficiently and accurately while demonstrating robustness to imperfect user inputs.
            Free, publicly-accessible full text available August 5, 2026
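An illustrative sketch of receptive-field back-tracing in the spirit of this abstract: given a user click, compute the input-space region that feeds the corresponding feature cell of a convolutional stack, so computation can be restricted to that crop. The layer list is a hypothetical backbone description, not TraceNet's actual architecture.

```python
# Standard receptive-field arithmetic for a stack of conv layers, used here to
# back-trace a click to the input crop that determines its feature cell.
def conv_stack_geometry(layers):
    """layers: list of (kernel, stride, padding).
    Returns (jump, rf, start): total stride, receptive-field size, and the
    input-space center of feature cell 0 for the whole stack."""
    jump, rf, start = 1, 1, 0.5          # input pixel i has center i + 0.5
    for k, s, p in layers:
        start += ((k - 1) / 2 - p) * jump
        rf += (k - 1) * jump
        jump *= s
    return jump, rf, start

def crop_for_click(click_x, click_y, layers, img_w, img_h):
    """Back-trace the receptive field of the feature cell under a click and
    return the input crop (x0, y0, x1, y1) that fully determines it."""
    jump, rf, start = conv_stack_geometry(layers)
    # Feature cell whose input-space center is nearest to the click.
    fx = round((click_x - start) / jump)
    fy = round((click_y - start) / jump)
    cx, cy = start + fx * jump, start + fy * jump
    x0, x1 = max(0, int(cx - rf / 2)), min(img_w, int(cx + rf / 2))
    y0, y1 = max(0, int(cy - rf / 2)), min(img_h, int(cy + rf / 2))
    return x0, y0, x1, y1

# Example: three stride-2, 3x3 conv layers on a 512x512 image.
layers = [(3, 2, 1)] * 3
print(crop_for_click(256, 300, layers, 512, 512))
```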
- 
            Neural Radiance Field (NeRF) has emerged as a powerful technique for 3D scene representation due to its high rendering quality. Among its applications, mobile NeRF video-on-demand (VoD) is especially promising, benefiting from both the scalability of mobile devices and the immersive experience offered by NeRF. However, streaming NeRF videos over real-world networks presents significant challenges, particularly due to limited bandwidth and temporal dynamics. To address these challenges, we propose NeRFlow, a novel framework that enables adaptive streaming for NeRF videos through both bitrate and viewpoint adaptation. NeRFlow solves three fundamental problems: first, it employs a rendering-adaptive pruning technique to determine voxel importance, selectively reducing data size without sacrificing rendering quality. Second, it introduces a viewpoint-aware adaptation module that efficiently compensates for uncovered regions in real time by combining pre-encoded master and sub-frames. Third, it incorporates a QoE-aware bitrate ladder generation framework, leveraging a genetic algorithm to optimize the number and configuration of bitrates while accounting for bandwidth dynamics and ABR algorithms. Through extensive experiments, NeRFlow is demonstrated to effectively improve user Quality of Experience (QoE) by 31.3% to 41.2%, making it an efficient solution for NeRF video streaming.
            Free, publicly-accessible full text available June 26, 2026
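A toy sketch of QoE-aware bitrate-ladder search with a genetic algorithm, loosely following the third component described in this abstract. The QoE model (log-utility with a stall penalty), the bandwidth samples, the candidate bitrates, and the GA hyperparameters are all illustrative assumptions, not the paper's.

```python
# Toy genetic-algorithm search for a 4-rung bitrate ladder that maximizes a
# simple average-QoE objective over sampled link capacities.
import math
import random

CANDIDATES = [1, 2.5, 5, 8, 12, 20, 35, 50]        # Mbps rungs to choose from
BANDWIDTH_TRACE = [3, 6, 9, 15, 25, 40, 18, 7]     # sampled capacities (Mbps)
LADDER_SIZE = 4

def qoe(ladder, bw):
    """Highest rung that fits the bandwidth; penalize if nothing fits (stall)."""
    feasible = [r for r in ladder if r <= bw]
    if not feasible:
        return -1.0
    return math.log(1 + max(feasible))  # diminishing returns in bitrate

def fitness(ladder):
    return sum(qoe(ladder, bw) for bw in BANDWIDTH_TRACE) / len(BANDWIDTH_TRACE)

def mutate(ladder):
    child = list(ladder)
    child[random.randrange(LADDER_SIZE)] = random.choice(CANDIDATES)
    return sorted(set(child)) if len(set(child)) == LADDER_SIZE else ladder

def crossover(a, b):
    mix = sorted(set(random.sample(a + b, LADDER_SIZE)))
    return mix if len(mix) == LADDER_SIZE else a

random.seed(0)
population = [sorted(random.sample(CANDIDATES, LADDER_SIZE)) for _ in range(20)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                       # keep the fittest half
    population = parents + [mutate(crossover(random.choice(parents),
                                             random.choice(parents)))
                            for _ in range(10)]
best = max(population, key=fitness)
print("best ladder (Mbps):", best, "avg QoE:", round(fitness(best), 3))
```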
- 
            Free, publicly-accessible full text available August 1, 2026
- 
            Free, publicly-accessible full text available February 26, 2026
- 
            Multi-camera systems are indispensable in movies, TV shows, and other media. Selecting the appropriate camera at every timestamp has a decisive impact on production quality and audience preferences. Learning-based view recommendation frameworks can assist professionals in decision-making; however, they often struggle outside their training domains, and the scarcity of labeled multi-camera view recommendation datasets exacerbates the issue. Based on the insight that many videos are edited from original multi-camera footage, we propose transforming regular videos into pseudo-labeled multi-camera view recommendation datasets. By training the model on pseudo-labeled datasets stemming from videos in the target domain, we achieve a 68% relative improvement in the model’s accuracy in the target domain and bridge the accuracy gap between in-domain and never-before-seen domains.
            Free, publicly-accessible full text available December 8, 2025
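A hedged sketch of the pseudo-labeling idea described in this abstract: for each moment of an edited program, find the time-aligned raw camera feed whose features are most similar and record that camera index as the "chosen view" label. Cosine similarity over generic feature vectors is a deliberately simple stand-in for whatever matching the paper actually uses.

```python
# Pseudo-label generation by nearest-camera matching on per-frame features.
import numpy as np

def pseudo_label(edited_frames, camera_frames):
    """edited_frames: (T, D) features of the edited video.
    camera_frames: (C, T, D) time-aligned features from C raw cameras.
    Returns a (T,) array of pseudo-labels: which camera each moment came from."""
    def normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    e = normalize(edited_frames)               # (T, D)
    c = normalize(camera_frames)               # (C, T, D)
    sims = np.einsum("td,ctd->ct", e, c)       # cosine similarity per camera, per time
    return sims.argmax(axis=0)                 # best-matching camera index per frame

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, C, D = 100, 6, 64
    cams = rng.normal(size=(C, T, D))
    truth = rng.integers(0, C, size=T)                          # simulated edit decisions
    edited = cams[truth, np.arange(T)] + 0.05 * rng.normal(size=(T, D))
    labels = pseudo_label(edited, cams)
    print("agreement with simulated ground truth:", (labels == truth).mean())
```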
- 
            Free, publicly-accessible full text available December 1, 2025
- 
            Free, publicly-accessible full text available November 4, 2025